Main analysis

Questions tested

For this research, we try to answer the following questions:

1. Does distance from a productive coast influence the intensity of a site’s occupation?
2. Does distance from a productive coast influence the density of artifacts found at a site?
3. Can we use shell density as an indicator of the importance of shellfish in prehistoric foragers’ diet?

Summary of methods

At the end of each run, two types of cells export information: cells that were occupied by a camp at some point, and cells where shellfish was processed. The exported information is:

1. The kcal of shells, plants, and meat brought back to the cell on each day it was occupied (or processed on it)
2. The number of hunting tools discarded on it

To answer questions 1 and 2, we ran linear and third-degree polynomial regressions to evaluate the effect of a cell’s distance from the coast on its occupation length and on the size of its artifact assemblage. We then categorized each cell by its coastal status (coastal if adjacent to the ocean, inland otherwise). For each run, we computed the average occupation length and the average number of discarded hunting tools for the cells in each habitat (coastal vs. inland). To explore yearly differences in occupation and artifact discard, we compared those averages across individual simulations using boxplots and non-parametric Wilcoxon signed-rank tests. To evaluate how continued occupation of the same landscape would change its archaeological signature over longer time periods, we compiled the cells’ values across all 3200 runs. We then again compared coastal vs. inland medians of occupation and artifact counts using non-parametric Wilcoxon signed-rank tests.

To answer question 3, we calculated the percentage of subsistence covered by shellfish across all simulations. We then recomputed it using only the cells above the 95th percentile of cumulative occupation (the most occupied cells), and compared the two percentages with a Wilcoxon test. We also restricted the most occupied cells to coastal cells and computed the percentage contribution of shellfish on those. The difference between this average and the average of all cells was again tested with a Wilcoxon test. Finally, we plotted the shellfish calories processed at each cell against the cell’s distance from the coast to see whether the pattern matched empirical observations (Jerardino 2016).

Variables used

Based on the results of the Sensitivity Analyses (see QI_Sensitivity_Analyses.Rmd), the following variables were used to create a representative dataset:

Variable              Value(s)
spatial-foresight     TRUE
nrcamps               5, 15
daily-time-budget     10, 12
hunter-percent        0.3
vision-forager        20
forager-movement      local-patch-choice, random
nrforagers            60
days_of_foresight     5, 10
global-knowledge?     TRUE
walk-speed            2, 3
point-recycling       25
point-hunting-rate    50
processing Threshold  20, 50

Each parameter combination was run 50 times. Each run lasted 365 time steps (days). This created a total of 3200 runs, each representing one year.
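The run count follows from a full factorial design. Six parameters take two values each, giving 2^6 = 64 combinations, and 64 combinations × 50 replicates = 3200 runs. The short Python sketch below only checks that arithmetic by enumerating the grid. It is not the setup actually used to launch the simulations, and the dictionary layout is illustrative.

```python
from itertools import product

# Parameters varied in the sweep, with the values listed in the table above.
# Parameters held constant (e.g. nrforagers = 60) do not change the count.
varied = {
    "nrcamps": [5, 15],
    "daily-time-budget": [10, 12],
    "forager-movement": ["local-patch-choice", "random"],
    "days_of_foresight": [5, 10],
    "walk-speed": [2, 3],
    "processing Threshold": [20, 50],
}
REPLICATES = 50  # runs per parameter combination

# Full factorial design: one dict per combination of the varied values.
combinations = [dict(zip(varied, values)) for values in product(*varied.values())]
total_runs = len(combinations) * REPLICATES

print(len(combinations), total_runs)  # 64 3200
```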

The following is a map of the covered region, with each biome’s highest productivity.

Question 1: Does distance from a productive coast influence the intensity of a site’s occupation?

Regressions

Yearly data

First, this linear regression treats each cell in each simulation as an individual observation (i.e., yearly data). It includes all cells that were occupied by a camp at least once in a simulation.

## 
## Call:
## lm(formula = fmla, data = ds)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -53.3  -13.6  -12.1   -7.2  352.9 
## 
## Coefficients:
##             Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)  13.0500     0.0758     172 <0.0000000000000002 ***
## distCoast     0.8665     0.0074     117 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 55.8 on 683344 degrees of freedom
## Multiple R-squared:  0.0197, Adjusted R-squared:  0.0197 
## F-statistic: 1.37e+04 on 1 and 683344 DF,  p-value: <0.0000000000000002

The graph shows a lot of variability, which explains the very weak fit of the linear regression (R² = 0.020). The graph also suggests that the relationship may be non-linear, so we ran a third-degree polynomial regression on the same dataset to see whether it would improve the results. The fit is only marginally better (R² = 0.026), which still indicates a weak relationship between the two variables.

## 
## Call:
## lm(formula = fmla, data = ds)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -92.2  -16.1  -10.3   -6.5  355.5 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)           17.0923     0.0673   254.1 <0.0000000000000002 ***
## poly(distCoast, 3)1 6537.4870    55.6079   117.6 <0.0000000000000002 ***
## poly(distCoast, 3)2 -754.9714    55.6079   -13.6 <0.0000000000000002 ***
## poly(distCoast, 3)3 3729.5058    55.6079    67.1 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 55.6 on 683342 degrees of freedom
## Multiple R-squared:  0.0264, Adjusted R-squared:  0.0264 
## F-statistic: 6.17e+03 on 3 and 683342 DF,  p-value: <0.0000000000000002
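The polynomial models in this section were fitted in R with `lm()` and `poly(distCoast, 3)`. `poly()` uses an orthogonal polynomial basis, so its coefficients are not the b0 to b3 of a raw cubic. However, the fitted values and R² are identical, because both bases span the same space. As a hedged illustration of the technique (not the authors' code), the sketch below fits a raw-basis cubic by solving the normal equations in plain Python. The function names `solve` and `fit_cubic` are hypothetical.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_cubic(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x**2 + b3*x**3 (raw polynomial basis).

    Returns (coefficients, r_squared), where r_squared is the "Multiple R-squared".
    """
    X = [[xi ** p for p in range(4)] for xi in x]  # design matrix rows [1, x, x^2, x^3]
    # Normal equations (X^T X) b = X^T y.
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(4)] for i in range(4)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(4)]
    b = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return b, 1 - ss_res / ss_tot


# Usage with made-up values (not model output): an exact cubic is recovered with R^2 = 1.
dist = list(range(10))
occ = [2 + 0.5 * d - 0.1 * d ** 2 + 0.01 * d ** 3 for d in dist]
coefs, r2 = fit_cubic(dist, occ)
print([round(c, 4) for c in coefs], round(r2, 6))
```

Raw powers of large distances make the normal equations ill-conditioned. This is one reason R's `poly()` defaults to an orthogonal basis.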

Palimpsest data

We then ran the linear regression on the dataset aggregated by cell (i.e., summing each cell's length of occupation and artifact assemblage across all 3200 simulations). Note that the graph axes are log-scaled to make the data easier to see.

## 
## Call:
## lm(formula = fmla, data = ds)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##   -221   -180   -134    -12 214948 
## 
## Coefficients:
##             Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)   222.85       9.95   22.40 <0.0000000000000002 ***
## distCoast      -4.83       0.55   -8.78 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1820 on 73837 degrees of freedom
## Multiple R-squared:  0.00104,    Adjusted R-squared:  0.00103 
## F-statistic:   77 on 1 and 73837 DF,  p-value: <0.0000000000000002

The linear regression (R² = 0.001) and the graph suggest that distance to the coast does not linearly predict the occupation length of a given cell. There may still be a non-linear relationship, however, especially in the palimpsest created by all runs. Unfortunately, a third-degree polynomial regression improves the fit only slightly (R² = 0.006), owing to the high variability in the data.

## 
## Call:
## lm(formula = fmla, data = ds)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##   -494   -193    -53      8 214675 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)            158.18       6.67    23.7 <0.0000000000000002 ***
## poly(distCoast, 3)1 -15945.75    1811.92    -8.8 <0.0000000000000002 ***
## poly(distCoast, 3)2  27313.97    1811.92    15.1 <0.0000000000000002 ***
## poly(distCoast, 3)3 -23736.66    1811.92   -13.1 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1810 on 73835 degrees of freedom
## Multiple R-squared:  0.00641,    Adjusted R-squared:  0.00637 
## F-statistic:  159 on 3 and 73835 DF,  p-value: <0.0000000000000002

Comparing coastal to non-coastal

We separated all cells into coastal and inland cells. Coastal cells are those with vegetation types 10 to 14 (TMS and Sandy Beach), i.e., the cells immediately adjacent to the ocean.

Then we compared the length of occupation and assemblage sizes in coastal vs. inland cells.
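The sketch below illustrates, in Python, the coastal/inland split and the per-run habitat averages described in the methods. It assumes the exported cell records carry each cell's vegetation type. The record layout and the function names (`habitat`, `habitat_means_per_run`) are hypothetical.

```python
from collections import defaultdict

COASTAL_VEG_TYPES = range(10, 15)  # vegetation types 10-14 (TMS and Sandy Beach)


def habitat(veg_type):
    """Coastal if the cell's vegetation type is 10-14, inland otherwise."""
    return "coastal" if veg_type in COASTAL_VEG_TYPES else "inland"


def habitat_means_per_run(records):
    """Average a cell value (e.g. occupation days) by habitat within each run.

    `records` holds (run_id, veg_type, value) tuples, a hypothetical layout.
    Returns {run_id: {"coastal": mean, "inland": mean}}; a habitat with no
    occupied cells in a run is simply absent from that run's dict.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for run_id, veg_type, value in records:
        h = habitat(veg_type)
        sums[run_id][h] += value
        counts[run_id][h] += 1
    return {run: {h: sums[run][h] / counts[run][h] for h in sums[run]} for run in sums}


# Usage with made-up records (not model output).
example = [(1, 10, 100), (1, 14, 50), (1, 3, 20), (2, 12, 80), (2, 5, 10), (2, 6, 30)]
result = habitat_means_per_run(example)
print(result)
# {1: {'coastal': 75.0, 'inland': 20.0}, 2: {'coastal': 80.0, 'inland': 20.0}}
```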

Yearly data

This first figure shows each site in each individual simulation (yearly dataset). The notches on the sides of the boxplots show the extent of the confidence interval around the median. Note that the y axis is on a log scale here for better visibility.
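This assumes the boxplots use R's standard notch rule, as in `boxplot.stats()` and ggplot2's `geom_boxplot()`. Under that rule, each notch spans roughly

$$\tilde{x} \pm 1.58\,\frac{\mathrm{IQR}}{\sqrt{n}}$$

where $\tilde{x}$ is the median, $\mathrm{IQR}$ the interquartile range (base R uses the hinges, which are close to the quartiles), and $n$ the number of observations in the box. Non-overlapping notches are rough evidence, at about the 95% level, that the two medians differ.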

The Wilcoxon test p-value for the difference between the two medians is <0.0000000000000002.
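For the yearly comparison, each run contributes one coastal and one inland average, so the samples are paired. Below is a minimal pure-Python sketch of a two-sided Wilcoxon signed-rank test, using the normal approximation with a tie correction. R's `wilcox.test()` uses an exact distribution for small samples without ties and otherwise applies a continuity correction by default, so p-values from this sketch will differ slightly from R's. The function name is hypothetical.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test for paired samples x, y.

    Uses the normal approximation with a tie correction and no continuity
    correction. Zero differences are dropped. Returns (w_plus, p_value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0  # sum of (t^3 - t) over groups of t tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean_w) / math.sqrt(var_w)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, p_value


# Usage with made-up per-run averages (not model output): coastal always higher.
coastal = [30 + k for k in range(20)]
inland = [10 + k / 2 for k in range(20)]
print(wilcoxon_signed_rank(coastal, inland))
```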

Palimpsest data

This second figure is for the palimpsest data. Note that the y axis is on a log scale here for better visibility.

The Wilcoxon test p-value for the difference between the two medians is <0.0000000000000002.

The difference is significant at both time scales.

Therefore, the answer to question 1 is: distance from the coast is not a good predictor of occupation length, because there is a lot of variability in the data. However, coastal cells are occupied longer than inland cells, especially over the long time scale.

The effect of the coast is even more visible when these values are plotted on a map. The first map shows the mean occupation length of each cell across all simulations, whereas the second shows the summed occupation length. Together, they show that accumulation (palimpsest formation) through repeated reoccupation of the same coastal cells strongly increases those cells’ length of occupation.

## [1] 366

## [1] 215170
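The two maps differ only in how each cell's yearly values are combined across runs. Below is a minimal Python sketch of that step, assuming a flat list of (run, cell, days) records. The text does not say whether the mapped mean also counts runs in which a cell was never occupied. This sketch averages only over the runs in which the cell appears. The function name and record layout are hypothetical.

```python
from collections import defaultdict


def palimpsest_by_cell(records):
    """Collapse yearly records into per-cell palimpsest values.

    `records` holds (run_id, cell_id, occupation_days) tuples, one per cell that
    was occupied in a run (a hypothetical layout of the exported data).
    Returns {cell_id: (summed_days, mean_days)}; the mean is taken over the runs
    in which the cell appears, not over all 3200 runs.
    """
    total = defaultdict(float)
    runs = defaultdict(int)
    for _run_id, cell_id, days in records:
        total[cell_id] += days
        runs[cell_id] += 1
    return {cell: (total[cell], total[cell] / runs[cell]) for cell in total}


# Usage with made-up records: cell "a" is occupied in two runs, cell "b" in one.
example = [(1, "a", 10), (2, "a", 30), (1, "b", 5)]
print(palimpsest_by_cell(example))  # {'a': (40.0, 20.0), 'b': (5.0, 5.0)}
```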

Question 2: Does distance from a productive coast influence the density of artifacts found at a site?

Regressions

Yearly data

We first looked at individual simulations separately (yearly data). This dataset includes only the sites with at least one discarded hunting tool.

The following linear regression shows the impact of distance from the coast on a cell’s assemblage of discarded hunting tools.

## 
## Call:
## lm(formula = fmla, data = ds)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -0.167 -0.082 -0.079 -0.079  3.921 
## 
## Coefficients:
##             Estimate Std. Error t value             Pr(>|t|)    
## (Intercept) 1.079038   0.002262  476.98 < 0.0000000000000002 ***
## distCoast   0.001845   0.000227    8.14  0.00000000000000042 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.317 on 23083 degrees of freedom
## Multiple R-squared:  0.00286,    Adjusted R-squared:  0.00282 
## F-statistic: 66.3 on 1 and 23083 DF,  p-value: 0.000000000000000416

While the graph suggests a slight correlation between the values, the linear regression explains almost none of the variance (R² = 0.0029). A third-degree polynomial improves the fit only slightly (R² = 0.0034).

## 
## Call:
## lm(formula = fmla, data = ds)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -0.169 -0.084 -0.076 -0.076  3.924 
## 
## Coefficients:
##                     Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)          1.08612    0.00209  520.19 < 0.0000000000000002 ***
## poly(distCoast, 3)1  2.58282    0.31723    8.14  0.00000000000000041 ***
## poly(distCoast, 3)2 -0.96520    0.31723   -3.04               0.0023 ** 
## poly(distCoast, 3)3  0.62042    0.31723    1.96               0.0505 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.317 on 23081 degrees of freedom
## Multiple R-squared:  0.00343,    Adjusted R-squared:  0.0033 
## F-statistic: 26.5 on 3 and 23081 DF,  p-value: <0.0000000000000002

Palimpsest data

The following regression uses the palimpsest data (again including only cells where at least one hunting tool was discarded). Note that the y axis is on a log scale here for better visibility.

## 
## Call:
## lm(formula = fmla, data = ds)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##   -5.9   -5.3   -3.9   -1.8  522.1 
## 
## Coefficients:
##             Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)   6.8992     0.3387    20.4 <0.0000000000000002 ***
## distCoast    -0.2111     0.0206   -10.3 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.8 on 7400 degrees of freedom
## Multiple R-squared:  0.014,  Adjusted R-squared:  0.0139 
## F-statistic:  105 on 1 and 7400 DF,  p-value: <0.0000000000000002

Here, while the regression remains weak (R² = 0.014), the graph clearly shows that the largest assemblages occur only in cells close to the ocean. A third-degree polynomial regression improves the fit a little (R² = 0.049).

## 
## Call:
## lm(formula = fmla, data = ds)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -11.2   -6.7   -2.7    2.4  516.9 
## 
## Coefficients:
##                     Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)             4.73       0.26    18.2 <0.0000000000000002 ***
## poly(distCoast, 3)1  -233.79      22.37   -10.4 <0.0000000000000002 ***
## poly(distCoast, 3)2   261.27      22.37    11.7 <0.0000000000000002 ***
## poly(distCoast, 3)3  -258.45      22.37   -11.6 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.4 on 7398 degrees of freedom
## Multiple R-squared:  0.0487, Adjusted R-squared:  0.0483 
## F-statistic:  126 on 3 and 7398 DF,  p-value: <0.0000000000000002

Comparing coastal to non-coastal

Yearly data

We then compared assemblage sizes in coastal vs. inland cells, using the complete dataset of cells with at least one hunting tool. This first figure shows each site in each individual simulation (yearly values). Note that the y axis is on a log scale here for better visibility.

The Wilcoxon test p-value for the difference between the two medians is 0.015. This graph is difficult to read because most values are 1 in both habitats.

Palimpsest data

The second graph is for palimpsest data. Note that the y axis is on a log scale here for better visibility.

The Wilcoxon test p-value for the difference between the two medians is <0.0000000000000002.

This graph shows the difference in assemblage size between coastal and inland sites more clearly.

Here again, we can map the mean and summed size of assemblages per cell to see if the coastal vs. inland separation shows up.

## [1] 5

## [1] 520

These maps show clearly that the reoccupation of the same cells has an important impact on the size of the assemblage accumulated on those cells.

Interestingly, higher means are found in Sand Fynbos, where most of the hunting takes place, whereas sums are higher on the coast.

The answer to question 2 is: distance from the coast is a weak but significant predictor of the size of artifact assemblages. However, coastal cells do have larger assemblages than inland cells.

Question 3: Can we use shell density as an indicator of the importance of shellfish in prehistoric foragers’ diet?

For each simulation, we calculated the total kcal consumed and the total kcal from each food source. We then used those numbers to calculate the ratio of each food source in the diet.
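The sketch below shows this calculation in Python, assuming each simulation's totals are available as a small dict per food source (a hypothetical layout). The text reports the shellfish share as an average (mean) across simulations. The sketch therefore computes percentages within each simulation first and then averages them. Pooling all kcal across simulations would give a slightly different figure whenever simulation totals differ. The function name is hypothetical.

```python
from statistics import mean


def diet_shares(simulations):
    """Mean kcal per simulation and mean % contribution of each food source.

    `simulations` is a list of {"shell": kcal, "meat": kcal, "plant": kcal}
    dicts, one per run (a hypothetical layout). Percentages are computed within
    each simulation first and then averaged across simulations.
    """
    per_sim_pct = []
    for kcal in simulations:
        total = sum(kcal.values())
        per_sim_pct.append({source: 100 * v / total for source, v in kcal.items()})
    mean_total = mean(sum(kcal.values()) for kcal in simulations)
    mean_pct = {source: mean(p[source] for p in per_sim_pct) for source in per_sim_pct[0]}
    return mean_total, mean_pct


# Usage with made-up totals (not model output).
runs = [{"shell": 10, "meat": 10, "plant": 80}, {"shell": 0, "meat": 20, "plant": 180}]
print(diet_shares(runs))
```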

All cells

The following table shows the relative contribution (%) of each food source across all cells and all simulations. It includes all cells, even those never occupied by a camp but where shellfish processing occurred.

## # A tibble: 1 x 4
##   meanKcalPerSim percShell percMeat percPlant
##            <dbl>     <dbl>    <dbl>     <dbl>
## 1      58252020.      5.32     4.65      90.0

Across all simulations, an average (mean) of 58,252,019.5 kcal was consumed per simulation, and shellfish accounted on average (mean) for 5.32% of the diet.

Most occupied cells

We then focused on the cells occupied the longest and recalculated the average shellfish contribution. We assume that archaeological sites will only be found where there is a certain level of reoccupation. In these reoccupied cells (total length of occupation above the 95th percentile), shellfish contribute 8.704% of the kcal.

## # A tibble: 1 x 3
##   percShell percMeat percPlant
##       <dbl>    <dbl>     <dbl>
## 1      8.70     4.82      86.5
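The selection of the most occupied cells described above can be sketched in Python as follows. The sketch assumes palimpsest totals are available per cell (a hypothetical layout) and computes the 95th percentile with linear interpolation, the same rule as R's default `quantile()` (type 7). It pools kcal over the selected cells before taking the shellfish ratio. The text does not state whether the reported figure is pooled or a mean of per-cell percentages, so that choice is an assumption. The function name is hypothetical.

```python
from statistics import quantiles


def shell_share_of_top_cells(cells, percentile=95):
    """Shellfish % of kcal in cells whose total occupation exceeds a percentile.

    `cells` is a list of dicts with the palimpsest totals of one cell each:
    {"occupation": days, "shell": kcal, "meat": kcal, "plant": kcal}
    (a hypothetical layout). The percentile uses linear interpolation
    (method="inclusive", the same rule as R's default quantile type 7), and
    kcal are pooled over the selected cells before taking the ratio.
    """
    cut_points = quantiles([c["occupation"] for c in cells], n=100, method="inclusive")
    threshold = cut_points[percentile - 1]
    top = [c for c in cells if c["occupation"] > threshold]
    shell = sum(c["shell"] for c in top)
    total = sum(c["shell"] + c["meat"] + c["plant"] for c in top)
    return 100 * shell / total


# Usage with made-up cells: occupation 1..100, only the five longest-occupied
# cells (96..100) hold shell refuse.
cells = [
    {"occupation": d, "shell": 50 if d > 95 else 0, "meat": 10, "plant": 40 if d > 95 else 90}
    for d in range(1, 101)
]
print(shell_share_of_top_cells(cells))  # 50.0
```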

A Wilcoxon test on the difference between this sample of well-occupied cells and the whole landscape gives a p-value of <0.0000000000000002.

So, when we focus on the most occupied cells, the shellfish contribution increases significantly and no longer reflects its actual share of the diet across the simulated landscape. This suggests that we may overestimate the importance of shellfish in prehistoric people’s diets.

This pattern is even stronger when we focus only on the most occupied coastal cells:

## # A tibble: 2 x 4
##   coastal     percShell percMeat percPlant
##   <chr>           <dbl>    <dbl>     <dbl>
## 1 Coastal         14.4      5.02      80.6
## 2 Non-coastal      1.83     4.58      93.6

A Wilcoxon test comparing the two averages (most reused coastal cells vs. all cells) gives a p-value of <0.0000000000000002.

The answer to question 3 is: no, we cannot equate the shellfish density in coastal archaeological sites with the importance of shellfish in prehistoric people’s diets. The most frequently visited coastal sites contain a larger share of shellfish refuse than shellfish contribute to the diet overall.

Visualizing the relationship between distance from coast and shellfish discard

We follow this with a few graphs showing the relationship between a cell’s distance from the coast and the amount of shellfish processed (and, in most cases, eaten) at that cell.

Comparison with empirical data

We compare the graphs above with similar graphs based on empirical data from South and West African archaeological sites (compiled by Jerardino 2016). The equivalent graph below is based on Jerardino’s Table 2.

Out of curiosity, we also fitted third-degree polynomial regressions to this dataset.

## 
## Call:
## lm(formula = `MNI/m3 (average)` ~ poly(DistanceToShoreCorr, 3), 
##     data = jerardino, na.rm = T)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -10971  -3473   -614   2561  18075 
## 
## Coefficients:
##                               Estimate Std. Error t value   Pr(>|t|)    
## (Intercept)                      10313       1335    7.73 0.00000014 ***
## poly(DistanceToShoreCorr, 3)1   -15468       6794   -2.28     0.0334 *  
## poly(DistanceToShoreCorr, 3)2    -4558       6710   -0.68     0.5045    
## poly(DistanceToShoreCorr, 3)3    19843       6725    2.95     0.0076 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6660 on 21 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.412,  Adjusted R-squared:  0.328 
## F-statistic: 4.91 on 3 and 21 DF,  p-value: 0.00974
## 
## Call:
## lm(formula = `kg/m3 average` ~ poly(DistanceToShoreCorr, 3), 
##     data = jerardino, na.rm = T)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -143.2  -79.4  -12.7   71.1  166.9 
## 
## Coefficients:
##                               Estimate Std. Error t value   Pr(>|t|)    
## (Intercept)                      138.2       19.4    7.13 0.00000037 ***
## poly(DistanceToShoreCorr, 3)1   -206.5       98.8   -2.09      0.048 *  
## poly(DistanceToShoreCorr, 3)2    -52.3       98.8   -0.53      0.602    
## poly(DistanceToShoreCorr, 3)3    278.1       98.8    2.82      0.010 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 98.8 on 22 degrees of freedom
## Multiple R-squared:  0.364,  Adjusted R-squared:  0.277 
## F-statistic: 4.19 on 3 and 22 DF,  p-value: 0.0173

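Both regressions show moderate relationships between distance from the coast and the shellfish abundance proxies (MNI/m³: adjusted R² = 0.33, p = 0.0097, n = 25; kg/m³: adjusted R² = 0.28, p = 0.017, n = 26). In both models, the linear and cubic terms are statistically significant, whereas the quadratic term is not (p = 0.50 and p = 0.60).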
